Abstract

We show a strict hierarchy among various edge and vertex expansion properties of Markov chains. 
This gives easy proofs of a range of bounds, both classical and new, on chi-square distance, spectral 
gap and mixing time. The 2-gradient is then used to give an isoperimetric proof that a random 
walk on the grid $[k]^n$ mixes in time $O^*(k^2 n)$.
1 Introduction

Markov chain algorithms have been used to solve a variety of previously intractable approximation 
problems. These have included approximating the permanent, estimating volume, counting contin- 
gency tables, and studying stock portfolios, among others. In all of these cases a critical point has 
been to show that a Markov chain is rapidly mixing, that is, within a number of steps polynomial in 
the problem size the Markov chain approaches a stationary (usually uniform) distribution $\pi$.

Intuitively, a random walk on a graph (i.e., a Markov chain) is rapidly mixing if there are no 
bottlenecks. This isoperimetric argument has been formalized by various authors. Jerrum and Sinclair 
[6] showed rapid mixing occurs if and only if the underlying graph has sufficient edge expansion, also 
known as high conductance. Lovász and Kannan [8] showed that the mixing is faster if small sets have
larger expansion. Kannan, Lovász and Montenegro [7] and Morris and Peres [11] extended this and
showed that the mixing is even faster if every set also has a large number of boundary vertices, i.e., 
good vertex expansion. 

In a separate paper [?] the present author has shown that the extensions of [8, 7, 11] almost 
always improve on the bounds of [6], by showing that standard methods used to study conductance 
- via geometry, induction or canonical paths - can be extended to show that small sets have higher 
expansion or that there is high vertex expansion. This typically leads to bounds on mixing time that 
are within a single order of magnitude of optimal. However, none of these methods fully exploits
the results of [7, 11], as each involves only two of three properties: edge expansion, vertex expansion 
and conditioning on set size. 

Before introducing our results, let us briefly discuss the measures of set expansion / congestion
that are used in [7, 11]. Note that for the remainder of the paper congestion and bottleneck mean that
there are either few edges from a set $A$ to its complement, or that there are few boundary vertices, i.e.,
either edge or vertex expansion is poor. Kannan et al. developed blocking conductance bounds on
mixing for three measures of congestion. The spread $\psi^+(x)$ measures the worst congestion for sets of
sizes in $[x/2, x]$, so if there are bottlenecks at small set sizes but not at larger ones then this is a good
measure to use. In contrast, the modified spread $\psi_{mod}(x)$ measures the worst congestion among sets
of all sizes $\le x$, but comes with stronger mixing bounds, so for a "typical" case where the congestion
gets worse as set size increases this is best. The third measure, global spread $\psi_{gl}(x)$, measures a
weighted congestion among sets of sizes $\le x$, which is best only if the Markov chain has extremely low
congestion at small sets. Finally, Morris and Peres' evolving sets uses a different measure $\psi_{evo}(x)$ of
the worst congestion among sets of sizes $\le x$. Their method comes with very good mixing time bounds
in terms of $\psi_{evo}(x)$, and because it bounds the stronger chi-square distance it also implies bounds on
the spectral gap. However, it is unclear how the size of $\psi_{evo}(x)$ compares with the three congestion
measures of blocking conductance and hence we do not know which method is best unless all four of
the congestion measures are computed, a non-trivial task.

We begin by showing that the spread $\psi^+$ lower bounds the evolving sets quantity $\psi_{evo}$ of Morris
and Peres [11]. This implies a non-reversible form of $\psi^+$, as well as lower bounds on the spectral gap
and on chi-square distance. The other forms of blocking conductance are found to upper bound $\psi_{evo}$
and are more appropriate for total variation distance. Moreover, an "optimistic" form of the spread
turns out to upper bound the spectral gap and lower bound total variation mixing time, although this
form is not useful in practice.

Houdré and Tetali [5], in the context of concentration inequalities, considered the discrete gradients
$h_p^+(x)$, a family which involves all three properties of the new mixing methods (edges, vertices and set
size), with $p = 1$ measuring only edges, $p = 2$ weighting edges and vertices roughly equally, and $p = \infty$
measuring only vertices. In this paper it is shown that the spread function $\psi^+(x)$ is closely bounded
both above and below in terms of $h_2^+(x)$. It is found that various classical isoperimetric bounds on mixing time
and spectral gap are essentially the best lower bound approximations to the quantity $h_2^+(x)^2/2$. The
$h_1^+(1/2)$ approximation is the theorem of Jerrum and Sinclair, $h_1^+(x)$ leads to the average conductance
of Lovász and Kannan, $h_\infty^+(1/2)$ gives a mixing time bound of Alon [2], and $h_2^+(x)$ gives a bound
shown by Morris and Peres [11] and in a weaker form by this author [10].

Of these various bounds the one that is the most relevant to our purposes is $h_2^+(x)$, since this is
weighted equally between edge and vertex isoperimetry. In order to give an application of our methods
we show how two additional isoperimetric quantities, Bobkov's constant $b_p^+$ and Murali's $\beta^+$ [12], can
be used to bound $h_2^+(x)$ for products of Markov chains. We apply this to prove a lower bound on
$h_2^+(x)$ for a random walk on the grid $[k]^n$. This leads to a mixing time bound of $O(k^2 n \log^2 n)$, the
first isoperimetric proof of the correct $\tau = O^*(k^2 n)$ for this Markov chain.

The paper proceeds as follows. In Section 2 we introduce notation. Section 3 shows the connection 
between spread, evolving sets and spectral gap. Section 4 gives results on the discrete gradients, 
including sharpness. Section 5 finishes the paper with the isoperimetric bound on the grid $[k]^n$.

2 Preliminaries 

A finite state Markov chain $\mathcal{M}$ is given by a state space $K$ with cardinality $|K| = n$, and the transition
probability matrix, an $n \times n$ square matrix $P$ such that $P_{ij} \in [0,1]$ and $\forall i \in K : \sum_{j \in K} P_{ij} = 1$.
Probability distributions on $K$ are represented by $1 \times n$ row vectors, so that if the initial distribution
is $p^{(0)}$ then the $t$-step distribution is given by $p^{(t)} = p^{(0)} P^t$.

The Markov chains considered here are irreducible ($\forall i, j \in K, \exists t : (P^t)_{ij} > 0$) and aperiodic
($\forall i : \gcd\{t : (P^t)_{ii} > 0\} = 1$). Under these conditions there is a unique stationary distribution $\pi$ such
that $\pi P = \pi$, and moreover the Markov chain is ergodic ($\forall i, j \in K : \lim_{t \to \infty} (P^t)_{ij} = \pi_j$). All Markov
chains in this paper are lazy ($\forall i \in K : P_{ii} \ge \frac12$); lazy chains are obviously aperiodic.

The time reversal of a Markov chain $\mathcal{M}$ is the Markov chain with transition probabilities $\overleftarrow{P}(u,v) =
\pi(v)P(v,u)/\pi(u)$, and has the same stationary distribution $\pi$ as the original Markov chain. It is often
easier to consider time reversible Markov chains ($\forall i, j \in K : \pi(i) P(i,j) = \pi(j) P(j,i)$). In the time
reversible case $\overleftarrow{P} = P$ and the reversal is just the original Markov chain.
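
The definitions above can be checked numerically on a small chain. The following is a minimal sketch, not part of the paper: the helper names `lazy`, `stationary` and `time_reversal` and the 3-state example chain are assumptions chosen for illustration. It builds a lazy chain, computes its stationary distribution, forms the time reversal, and tests detailed balance.

```python
import numpy as np

def lazy(P):
    """Lazy version (I + P) / 2, so every holding probability is at least 1/2."""
    return 0.5 * (np.eye(P.shape[0]) + P)

def stationary(P):
    """Stationary row vector pi (pi P = pi), read off the left eigenvector for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

def time_reversal(P, pi):
    """Reversed chain: P_rev(u, v) = pi(v) P(v, u) / pi(u)."""
    return (P.T * pi[None, :]) / pi[:, None]

# Hypothetical example: a clockwise-biased walk on a 3-cycle, made lazy.
P = lazy(np.array([[0.0, 0.9, 0.1],
                   [0.1, 0.0, 0.9],
                   [0.9, 0.1, 0.0]]))
pi = stationary(P)
P_rev = time_reversal(P, pi)

# Detailed balance pi(i) P(i, j) = pi(j) P(j, i) holds iff the chain is reversible.
is_reversible = np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```

For this biased cycle the stationary distribution is uniform, the reversal is the counter-clockwise walk, and detailed balance fails, so results stated only for reversible chains would not apply to it directly.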

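The distance of $p^{(t)}$ from $\pi$ is measured by the $L^p$ distance $\|\cdot\|_{L^p(\pi)}$, which for $p \ge 1$ is given by

$$\left\| p^{(t)} - \pi \right\|_{L^p(\pi)}^p = \sum_{v \in K} \left| \frac{p^{(t)}(v)}{\pi(v)} - 1 \right|^p \pi(v) .$$

The total variation distance is $\|\cdot\|_{TV} = \frac12 \|\cdot\|_{L^1(\pi)}$, and the $\chi^2$-distance is $\|\cdot\|_{\chi^2(\pi)} = \|\cdot\|_{L^2(\pi)}^2$.

The mixing time measures how many steps it takes a Markov chain to approach the stationary
distribution,

$$\tau(\epsilon) = \max_{p^{(0)}} \min\left\{ t : \|p^{(t)} - \pi\|_{TV} \le \epsilon \right\}, \qquad
\chi^2(\epsilon) = \max_{p^{(0)}} \min\left\{ t : \|p^{(t)} - \pi\|_{\chi^2(\pi)} \le \epsilon \right\} .$$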



Cauchy-Schwarz shows that $2\|\cdot\|_{TV} \le \|\cdot\|_{L^2(\pi)}$, from which it follows that $\tau(\epsilon) \le \chi^2(4\epsilon^2)$.
Morris and Peres showed a nice fact about general (non-reversible) Markov chains [11]: for all $x, z \in K$,

$$\left| \frac{P^{n+m}(x,z)}{\pi(z)} - 1 \right| \le \left\| P^n(x,\cdot) - \pi \right\|_{\chi^2(\pi)}^{1/2} \left\| \overleftarrow{P}^m(z,\cdot) - \pi \right\|_{\chi^2(\pi)}^{1/2} ,$$

and so chi-square mixing can be used to show small relative pointwise distance $p^{(t)}(\cdot)/\pi(\cdot)$. This makes
chi-square mixing a stronger condition than total variation mixing.
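
As a concrete illustration of the two mixing times, the sketch below computes $\tau(1/4)$ and $\chi^2(1/4)$ by brute force for the lazy walk on the complete graph $K_8$. The function names and the choice $n = 8$ are assumptions made for this example. Both distances are convex in the starting distribution, so the maximum over $p^{(0)}$ is attained at a point mass and only the rows of $P^t$ need to be examined.

```python
import numpy as np

def tv_distance(p, pi):
    """Total variation distance: half the L^1(pi) norm of p/pi - 1."""
    return 0.5 * np.abs(p - pi).sum()

def chi2_distance(p, pi):
    """Chi-square distance: squared L^2(pi) norm of p/pi - 1."""
    return float((((p / pi) - 1.0) ** 2 * pi).sum())

def mixing_time(P, pi, eps, dist, t_max=10_000):
    """Smallest t such that dist(p^(t), pi) <= eps from every point-mass start."""
    n = P.shape[0]
    Pt = np.eye(n)                      # row i of Pt is the t-step distribution from state i
    for t in range(t_max + 1):
        if max(dist(Pt[i], pi) for i in range(n)) <= eps:
            return t
        Pt = Pt @ P
    raise RuntimeError("did not mix within t_max steps")

# Lazy walk on the complete graph K_n (n = 8 is an arbitrary illustrative size).
n = 8
P = 0.5 * np.eye(n) + np.full((n, n), 1.0 / (2 * n))
pi = np.full(n, 1.0 / n)

tau = mixing_time(P, pi, 0.25, tv_distance)
chi2 = mixing_time(P, pi, 0.25, chi2_distance)   # 4 * eps^2 = 1/4 when eps = 1/4
```

With $\epsilon = 1/4$ the threshold $4\epsilon^2$ is also $1/4$, so the inequality $\tau(\epsilon) \le \chi^2(4\epsilon^2)$ can be read off directly by comparing `tau` with `chi2`.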

The ergodic flow between two points $i, j \in K$ is $q(i,j) = \pi_i P_{ij}$ and the flow between two sets
$A, C \subset K$ is $Q(A,C) = \sum_{i \in A,\, j \in C} q(i,j)$. In fact $Q(A,A^c) = Q(A^c,A)$, where $A^c := K \setminus A$.
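
The identity $Q(A,A^c) = Q(A^c,A)$ follows from stationarity, since $\pi(A) = Q(A,A) + Q(A,A^c)$ and also $\pi(A) = Q(A,A) + Q(A^c,A)$, and it does not require reversibility. A small sketch illustrating this, where the function name `flow` and the 3-state chain are assumptions for the example:

```python
import numpy as np

def flow(P, pi, A, C):
    """Ergodic flow Q(A, C) = sum over i in A, j in C of pi_i * P_ij."""
    A, C = list(A), list(C)
    return float((pi[A][:, None] * P[np.ix_(A, C)]).sum())

# Hypothetical lazy, clockwise-biased walk on a 3-cycle; pi is uniform because
# the matrix is doubly stochastic. The chain is not reversible.
P = np.array([[0.50, 0.45, 0.05],
              [0.05, 0.50, 0.45],
              [0.45, 0.05, 0.50]])
pi = np.full(3, 1.0 / 3.0)

A, Ac = [0], [1, 2]
out_flow = flow(P, pi, A, Ac)   # Q(A, A^c)
in_flow = flow(P, pi, Ac, A)    # Q(A^c, A)
```

The set-level flows agree even though the point-level flows $q(0,1)$ and $q(1,0)$ differ for this chain.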

The continuization $\tilde K$ of $K$ is defined as follows. Let $\tilde K = [0, 1]$ and to each point $v \in K$ assign an
interval $I_v = [a, b] \subset [0, 1] = \tilde K$ with $b - a = \pi(v)$, so that $m(I_x \cap I_y) = 0$ if $x \ne y$, and $[0, 1]$ is the union of
these intervals. Then if $A, B \subset \tilde K$ define $\pi(A) = m(A)$ and $Q(A, B) = \sum_{x, y \in K} \frac{m(A \cap I_x)}{\pi(x)} \frac{m(B \cap I_y)}{\pi(y)}\, q(x, y)$
for Lebesgue measure $m$. This is consistent with the definition of ergodic flow between sets in $K$. The
continuization $\tilde K$ is somewhat awkward but will be needed in our work, particularly for $\psi^+(A)$ below
and in the next few sections.

Various isoperimetric quantities have been used to upper bound $\tau(\epsilon)$ and $\chi^2(\epsilon)$. A few of them are
listed below. Unless explicitly stated, all sets both here and later in the paper will be in $K$, not $\tilde K$.

Conductance [6, 8]:

$$\Phi(x) = \min_{\pi(A) \le x} \Phi(A), \qquad \Phi(A) = \frac{Q(A,A^c)}{\pi(A)}, \qquad \Phi = \Phi(1/2),$$

Spread [7]:

$$h^+(x) = \sup_{A \subset K,\ x/2 \le \pi(A) \le x} \frac{1}{x\,\psi^+(A)} \quad \text{where} \quad \psi^+(A) = \int_0^{\pi(A)} \frac{\Psi(t,A^c)}{\pi(A)^2}\,dt,$$

Modified Spread [7]:

$$h_{mod}(x) = \sup_{A \subset K,\ \pi(A) \le x} \frac{1}{x\,\psi_{mod}(A)} \quad \text{where} \quad \psi_{mod}(A) = \int_0^1 \frac{\Psi(t,A^c)}{t^{\#}\,\pi(A)}\,dt,$$

Global Spread [7]:

$$h_{gl}(x) = \sup_{A \subset K,\ \pi(A) \le x} \frac{1}{\pi(A)\,\psi_{gl}(A)} \quad \text{where} \quad \psi_{gl}(A) = \int_0^1 \frac{\Psi(t,A^c)}{\pi(A)^2}\,dt,$$

Evolving Sets [11]:

$$\psi_{evo}(x) = \inf_{\pi(A) \le x} \psi_{evo}(A) \quad \text{where} \quad \psi_{evo}(A) = 1 - \int_0^1 \sqrt{\frac{\pi(A_u)}{\pi(A)}}\,du,$$

where $A_u = \{y \in K : \overleftarrow{P}(y, A) \ge u\}$ and

$$\Psi(t,A^c) = \begin{cases} \inf_{\tilde K \supset B \subset A,\ \pi(B) = t} Q(B,A^c) & \text{if } t \le \pi(A) \\ \Psi(1-t, A) & \text{if } t > \pi(A) \end{cases}$$

and $t^{\#} = \min\{t, 1 - t\}$.

Properties of the various $\psi$ quantities can be found in the introduction and in the following section.

The infimum in the definition of $\Psi(t, A^c)$ is achieved for some set $B \subset \tilde K$ when the Markov chain
is finite. For instance, when $t \le \pi(A)$ then one construction for $B$ is as follows: order the points
$v \in A$ in increasing order of $P(v,A^c)$ as $v_1, v_2, \ldots, v_n$, add to $B$ the longest initial segment $B_m :=
\{v_1, v_2, \ldots, v_m\}$ of these points with $\pi(B_m) \le t$, and for the remainder of $B$ take $t - \pi(B_m) < \pi(v_{m+1})$
units of $v_{m+1}$.
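
The greedy construction just described is easy to carry out numerically. The sketch below is illustrative only: the function names `Psi` and `spread` are assumptions, and the spread is taken to be the integral of $\Psi(t,A^c)$ over $[0, \pi(A)]$ divided by $\pi(A)^2$. It evaluates $\Psi(t,A^c)$ for $t \le \pi(A)$ by filling $B$ with the states of smallest exit probability first, taking a fraction of the last state. Because $\Psi$ is piecewise linear in $t$, the integral is computed exactly, one state at a time.

```python
import numpy as np

def exit_probabilities(P, pi, A):
    """Return P(v, A^c) for each v in A, in the order the states are listed in A."""
    Ac = [v for v in range(len(pi)) if v not in A]
    return P[np.ix_(A, Ac)].sum(axis=1)

def Psi(P, pi, A, t):
    """Psi(t, A^c) for 0 <= t <= pi(A): least flow into A^c from a (possibly
    fractional) subset B of A with pi(B) = t."""
    A = list(A)
    exit_prob = exit_probabilities(P, pi, A)
    value, remaining = 0.0, t
    for k in np.argsort(exit_prob):          # increasing exit probability
        take = min(remaining, pi[A[k]])      # a whole state, or a fraction of the last one
        value += take * exit_prob[k]
        remaining -= take
        if remaining <= 0:
            break
    return value

def spread(P, pi, A):
    """psi^+(A) = (1 / pi(A)^2) * integral of Psi(t, A^c) over t in [0, pi(A)]."""
    A = list(A)
    exit_prob = exit_probabilities(P, pi, A)
    integral, flow_so_far = 0.0, 0.0
    for k in np.argsort(exit_prob):
        m, e = pi[A[k]], exit_prob[k]
        integral += flow_so_far * m + 0.5 * e * m * m   # exact area over this linear piece
        flow_so_far += e * m
    return integral / pi[A].sum() ** 2

# Lazy walk on the complete graph K_8 with A = {0, 1, 2}, so x = pi(A) = 3/8.
n = 8
P = 0.5 * np.eye(n) + np.full((n, n), 1.0 / (2 * n))
pi = np.full(n, 1.0 / n)
A = [0, 1, 2]
psi_plus = spread(P, pi, A)
```

On the complete graph every state of $A$ has the same exit probability $(1-x)/2$, so $\Psi(t,A^c) = t(1-x)/2$ and the computed spread equals $(1-x)/4$.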

The quantities $A_u$ and $\Psi(t, A^c)$ are closely related. For time-reversible Markov chains, if $t =
\pi(A_u) \le \pi(A)$ then $A_u$ is the set of size $t$ with the highest flow into $A$, so the smallest flow into $A^c$,
and therefore $\Psi(t,A^c) = Q(A_u,A^c)$. Similarly, when $t = \pi(A_u) > \pi(A)$ then $\Psi(t,A^c) = Q(A_u^c,A)$.
The choice of $\pi(A_u)$ or $\Psi(t,A^c)$ is similar to the choice of Lebesgue or Riemann integral, where the
Lebesgue-like $\pi(A_u)$ measures the amount of $K$ with transition probability to $A$ above a certain level,
while $\Psi(t, A^c)$ is more Riemann-like in simply integrating along $P$ once $P$ has been put in increasing
order.

The quantities $\Phi$ and $\psi_{evo}(A)$ can be used to upper bound $\chi^2(\epsilon)$, while $\Phi(A)$, $\psi^+(A)$ and
$\psi_{gl/mod}(A)$ are used to upper bound $\tau(\epsilon)$. In the $\tau(\epsilon)$ case it suffices to bound $\tau(1/4)$, because
$\tau(\epsilon) \le \tau(1/4) \log_2(1/\epsilon)$ [1]. The two bounds of most interest here are:

Theorem 2.1. If $\mathcal{M}$ is a (lazy, aperiodic, ergodic) Markov chain then

,1/2 \ 

[7\ r(l/4) < 8 - 1376 1/ h(x) dx + h(l/2)\ 



\Jtt 



7T 

x l 2 dx log(8/e) 



Vw>(l/2) 



where $\pi_0 = \min_{v \in K} \pi(v)$ and $h(x)$ indicates $h^+(x)$, $h_{mod}(x)$ or $h_{gl}(x)$. The $h(x)$ bounds apply to
reversible Markov chains only, whereas the $\psi_{evo}(x)$ bound applies even in the non-reversible case.

3 Spread, $\chi^2$ and the Spectral Gap

In this section we show a connection between the spread function and evolving sets. We further 
explore this connection by finding that variations on the spread function both upper and lower bound 
the spectral gap. The connection to evolving sets implies a mixing time theorem with much stronger 
constants, as well as a non-reversible result. 

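Theorem 3.1. If $\mathcal{M}$ is a lazy Markov chain and $A$ is a subset of the state space with $\pi(A) \le 1/2$,
then let the (time reversed) spread function $\overleftarrow{\psi}^+(A)$ be given by

$$\overleftarrow{\psi}^+(A) = \int_0^{\pi(A)} \frac{\overleftarrow{\Psi}(t,A^c)}{\pi(A)^2}\,dt \quad \text{where} \quad \overleftarrow{\Psi}(t,A^c) = \inf_{\tilde K \supset B \subset A,\ \pi(B) = t} Q(A^c,B) ,$$

with $\overleftarrow{\psi}_{gl}(A)$ and $\overleftarrow{\psi}_{mod}(A)$ defined similarly, and where $\overleftarrow{\psi}^+(x) = \inf_{\pi(A) \le x} \overleftarrow{\psi}^+(A)$. Then

$$\overleftarrow{\psi}_{gl}(A) \ge \tfrac12\, \overleftarrow{\psi}_{mod}(A) \ge \psi_{evo}(A) \ge \tfrac14\, \overleftarrow{\psi}^+(A) ,$$

and in particular,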



r(e) <x 2 (4e 2 ) < { 




7T X 

2 „ , 



+ 



4 log(2/e 2 ) 
V + (l/2) 



U + (l/2) 



(log(l/7T ) + 2 log(l/2e)) . 



Observe that $\overleftarrow{\psi}^+$ is just $\psi^+$ of the time reversal. In particular, $\overleftarrow{\psi}^+(A) = \psi^+(A)$ when the Markov
chain is reversible, so this is an extension of the results of Kannan et al. [7].

Corollary 4.3 shows that $\overleftarrow{\psi}^+(1/2) \ge \Phi^2$ for lazy Markov chains, because $\overleftarrow{\Phi} = \Phi$ even for non-
reversible Markov chains. This approximation applied to the second upper bound on $\tau(\epsilon)$ is exactly a
factor two from the non-reversible bound shown by Mihail [9]. A more direct approach can be found
in [11] which recovers this factor of two.

The inequalities $\psi_{gl}(A) \ge \psi^+(A)$ and $\psi_{mod}(A) \ge \psi^+(A)$ follow almost immediately from the
definitions, so the theorem should not be used to lower bound either of these quantities by $\psi^+(A)$.

Nevertheless, the inequalities between $\psi$ terms given in the theorem are all sharp. The first two
inequalities are sharp for a walk with uniform transition probability $\alpha/2 < 1$ from $A$ and all the flow
concentrated in a region of size $\alpha\pi(A)$ in $A^c$. The final inequality is sharp as a limit. Let $D \to 4^-$,
$x_0 = (D/4)^{2/3}$, $\alpha = (4/D) - 1$. Then put a $1 - x_0$ fraction of $A$ with $P(\cdot, A^c) = \alpha/2$ and the remainder
with $P(\cdot, A^c) = 0$. This flow can be concentrated in a small region of $A^c$.

Even though $\psi_{gl}(A)$ is the largest quantity it is usually the least useful. As discussed in the
introduction, when there are bottlenecks at small values of $\pi(A)$ then $h^+(x)$ is best (i.e., smallest)
because of the conditioning on $\pi(A) \in [x/2, x]$. Spread $\psi^+(A)$ is also the easiest to compute, and
the connection to $\psi_{evo}(A)$ improves the constant terms in Theorem 2.1 greatly. For a "typical" case
$h_{mod}(x)$ is better than $h^+(x)$, but $h_{gl}(x)$ is poor because the supremum in $h_{gl}(x)$ may occur for small
$\pi(A)$. However, for graphs with extremely high node-expansion $h_{gl}(x)$ may be best. As a case
in point, on the complete graph $K_n$ we have $\tau(1/4) \le \chi^2(1/4) = O(\log n)$ via $\psi^+$ or $\psi_{evo}$, while
$\tau(1/4) = O(\log\log n)$ from $\psi_{mod}$ and $\tau(1/4) = O(1)$ from $\psi_{gl}$. However, on the cube $\{0, 1\}^n$, $\psi_{gl}$
implies only $\tau = O(n 2^n)$, hopelessly far from the correct $\tau = O(n \log n)$.

The following lemma shows how to rewrite $\overleftarrow{\Psi}(t, A^c)$ in terms of $\pi(A_u)$ and will be key to our proof.

Lemma 3.2. If $\mathcal{M}$ is a lazy Markov chain and $A \subset K$ is a subset of the state space then



Proof. We consider the case of $t \le \pi(A)$. A similar argument implies the result when $t > \pi(A)$.

When $t = \pi(A_x)$ for some $x \in [1/2, 1]$ then $A_x$ is the set of size $t$ with the highest flow from $A$, so
the smallest flow from $A^c$, and therefore $\overleftarrow{\Psi}(t,A^c) = Q(A^c,A_x)$ ($\overleftarrow{\Psi}$ considers the reversed chain, so it




where w(t) 
minimizes flow from $A^c$ rather than into $A^c$). But

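$$Q(A^c, A_x) = \sum_{y \in A_x} \pi(y)\, \overleftarrow{P}(y, A^c) = \sum_{y \in A_x} \int_0^1 \left(1 - 1_{\overleftarrow{P}(y,A) \ge u}\right) du\ \pi(y)$$

$$= \int_0^1 \sum_{y \in A_x} \left(1 - 1_{\overleftarrow{P}(y,A) \ge u}\right) \pi(y)\, du = \int_0^1 \left(\pi(A_x) - \pi(A_u \cap A_x)\right) du$$

$$= \int_{w(t)}^1 \left(\pi(A_x) - \pi(A_u)\right) du = \int_{w(t)}^1 \left(t - \pi(A_u)\right) du .$$

This gives the result if $t = \pi(A_x)$ for some $x$. Otherwise, the set $B$ where the infimum is achieved in
the definition of $\overleftarrow{\Psi}(t, A^c)$ contains $A_{w(t)+\delta}$ where $\delta \to 0^+$, and the remaining points $y \in B \setminus A_{w(t)+\delta}$
satisfy $\overleftarrow{P}(y, A) = w(t)$. Let $x = w(t) + \delta$, then $w(\pi(A_x)) = w(t)$ for $\delta$ sufficiently small and

$$\overleftarrow{\Psi}(t,A^c) = Q(A^c, A_x) + Q(A^c, B \setminus A_x)$$

$$= \int_{w(\pi(A_x))}^1 \left(\pi(A_x) - \pi(A_u)\right) du + \left(t - \pi(A_x)\right)\left(1 - w(t)\right)$$

$$= \int_{w(t)}^1 \left(t - \pi(A_u)\right) du .$$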



□ 



Proof of theorem. Rewriting $\overleftarrow{\psi}^+(A)$ in terms of $\pi(A_u)$ gives

$$\overleftarrow{\psi}^+(A) = \int_0^{\pi(A)} \frac{\overleftarrow{\Psi}(t,A^c)}{\pi(A)^2}\,dt = \int_0^{\pi(A)} \int_{w(t)}^1 \frac{t - \pi(A_u)}{\pi(A)^2}\,du\,dt$$

$$= \int_{1/2}^1 \int_{\pi(A_u)}^{\pi(A)} \frac{t - \pi(A_u)}{\pi(A)^2}\,dt\,du = \frac12 \int_{1/2}^1 \left(\frac{\pi(A) - \pi(A_u)}{\pi(A)}\right)^2 du$$

where the first equality follows from the definition of $\overleftarrow{\psi}^+(A)$, the second equality applies Lemma
3.2, the third is a change in the order of integration using that $w(t) \le u$ iff $\pi(A_u) \le t$, and the
final equality is integration with respect to $t$. Morris and Peres [11] used a Taylor approximation
$\sqrt{\pi(A_u)/\pi(A)} = \sqrt{1 + x} \le 1 + x/2 - (x^2/8)\,\delta_{x \le 0}$ for $x = \pi(A_u)/\pi(A) - 1$, and the Martingale
property of $\pi(A_u)$ that $\int_0^1 \pi(A_u)\,du = \pi(A)$ (Lemma 6 of [11]), to derive the lower bound

$$\psi_{evo}(A) \ge \frac18 \int_{1/2}^1 \left(\frac{\pi(A) - \pi(A_u)}{\pi(A)}\right)^2 du .$$

The lower bound $\psi_{evo}(A) \ge \overleftarrow{\psi}^+(A)/4$ follows.
Similarly, 

♦7 , A . f 1 y(t,A c ) , 

^mod(A) > / / 1 1 dt 

Jo tir(A) 

r {A) f 1 tziV£ dudt i f 1 [ w{t) ~ 1 

Jo Jw(t) tir(A) Jn(A)Jo ttr(A) 



du dt 



' dtdu+ / 1 ', dtdu 



t<A) ' J J HA) tir(A) 



1/2 Jw(A u ) \ ) JO Jw(A) 



tt(A) & V <A) 
The final equality used the Martingale property of $\pi(A_u)$, as does the equality below.



' {A) - 1! ( 




< I / ^-k>g( ) du < r niltll ( l)/2. 

where the inequality follows from $2(x - \sqrt{x}) \le x \log x$ for all $x > 0$.

To establish that $2\overleftarrow{\psi}_{gl}(A) \ge \overleftarrow{\psi}_{mod}(A)$, observe that when $t \in [\pi(A), 1 - \pi(A)]$ then the result is
trivial, as $1/(\min\{t, 1 - t\}\,\pi(A)) \le 1/\pi(A)^2$. When $t \in [0, \pi(A)]$ then let $f(t) = \overleftarrow{\Psi}(t,A^c)/t$. If $B$ is
the set where the infimum in $\overleftarrow{\Psi}(t, A^c)$ is achieved then $f(t)$ is the average probability of the reversed
chain making a transition from a point in $B$ to $A^c$ in a single step. It follows that $f(t)$ is an increasing
function, because as $t$ increases the points added to $B$ will have higher probability of leaving than any
of those previously added. We then have the following:

$$\int_0^{\pi(A)} \left( \frac{2\overleftarrow{\Psi}(t,A^c)}{\pi(A)^2} - \frac{\overleftarrow{\Psi}(t,A^c)}{t\,\pi(A)} \right) dt = \int_0^{\pi(A)} \frac{(2t - \pi(A))\, f(t)}{\pi(A)^2}\,dt$$

$$= \int_{\pi(A)/2}^{\pi(A)} \frac{(2t - \pi(A))\,\left(f(t) - f(\pi(A) - t)\right)}{\pi(A)^2}\,dt \ge 0 .$$

A similar argument holds for the interval $t \in [1 - \pi(A), 1]$.

The first upper bound for $\tau$ follows from $4\|\cdot\|_{TV}^2 \le \|\cdot\|_{\chi^2(\pi)}$ and Theorem 2.1. The second follows
from this and $\chi^2(\epsilon) \le (2\psi_{evo}(1/2))^{-1} \log(1/\epsilon\pi_0)$, which is another bound of Morris and Peres [11]. □

The connection between the spread function and mixing quantities is deeper than just an upper
bound on mixing time. In the proof that $\psi^+$ bounds mixing time [7] it is shown that for reversible
Markov chains there is some ordering of points in the state space $\tilde K = [0,1]$ such that the mixing time
is lower bounded by the case when $\psi_{correct}(x) = \psi_{correct}([0,x]) = \int_0^x \frac{Q([x - t,x], [x, 1])}{x^2}\,dt$. The most
pessimistic lower bound on $\psi_{correct}(x)$ is $\psi^+(x)$, hence an upper bound on mixing time, whereas

$$\psi_{big}(A) = \int_0^{\pi(A)} \frac{\Psi_{big}(t,A^c)}{\pi(A)^2}\,dt \quad \text{where} \quad \Psi_{big}(t,A^c) = \sup_{B \subset A,\ \pi(B) = t} Q(B,A^c) \qquad (1)$$

is the most pessimistic upper bound on $\psi_{correct}(A)$ when $A = [0, x]$, i.e., $\psi_{big}(x) \ge \psi_{correct}(x) \ge \psi^+(x)$.
The following theorem shows that this ordering carries over to mixing time and spectral gap, with
$\psi_{big}$ appearing in a lower bound on mixing time and in an upper bound on spectral gap.

Theorem 3.3. If $\mathcal{M}$ is a lazy, aperiodic, ergodic reversible Markov chain then

$$4\,\psi_{big}(1/2) \ge \lambda \ge \tfrac14\, \psi^+(1/2) .$$

The lower bound on $\lambda$, when combined with $\psi^+(1/2) \ge \Phi^2$, is a factor two from the well known
$\lambda \ge \Phi^2/2$ [13].
Proof. The upper bound for $\tau(\epsilon)$ follows from the previous theorem.

For the lower bound on $\lambda$ we need some information about the proofs of certain useful facts. First,

$$\tau(\epsilon) \ge \tfrac12 (1 - \lambda)\,\lambda^{-1} \ln(2\epsilon)^{-1} \quad \text{(see [13])} \qquad (2)$$

can be proven by first showing that $c(1 - \lambda)^t \le \sup_{p^{(0)}} \|p^{(t)} - \pi\|_{TV}$ for some constant $c$. Second,
the proof of the $\psi_{evo}$ part of Theorem 2.1 can be easily modified to show that $\sup_{p^{(0)}} \|p^{(t)} - \pi\|_{TV} \le
c_2 (1 - \psi_{evo}(1/2))^t$ for some constant $c_2$. It follows that $c(1 - \lambda)^t \le c_2 (1 - \psi_{evo}(1/2))^t$, and taking
$t \to \infty$ implies further that $1 - \lambda \le 1 - \psi_{evo}(1/2)$. The result then follows by $\psi_{evo}(A) \ge \frac14 \psi^+(A)$. In
words, this says that the asymptotic rate of convergence of total variation distance is at best $1 - \lambda$
and at worst $1 - \psi_{evo}(1/2)$, and therefore $1 - \lambda \le 1 - \psi_{evo}(1/2)$.

For the upper bound on $\lambda$ suppose that $\tilde K \supset A = [0, x]$ where $x = \pi(A) \le 1/2$, and order vertices
in $A$ by increasing $P(t, A^c)$. Then

, (A) + lh+(A) r^ Q([x-t,x},[x,l}) ^) QQD4M) 

Q(A,A C 



tt(A) 



HA) 



It follows that $\Phi(A) \ge \psi_{big}(A) \ge \Phi(A)/2$. The upper bound on $\lambda$ then follows from $\lambda \le 2\Phi$ [13].

The lower bound on mixing time follows from the upper bound on the spectral gap and the lower
bound on mixing time given in (2). □

It would be interesting to know if the lower bound on the mixing time can be improved. The
barbell consisting of two copies of $K_n$ joined by a single edge is a case where $\tau(1/4) < 1/\psi^+(1/2)$,
which shows that $\psi^+$ cannot replace $\psi_{big}$ in the lower bound. However, in those examples where we
know the answer we find that $\tau(1/4) \ge c \int dx/\psi^+(x)$.



4 Discrete Gradients 

In this section we look at the discrete gradients $h_p^{\pm}(A)$ of Houdré and Tetali [5]. This is a family
that extends the ideas of edge and vertex-expansion, with $h_1^{\pm}(A)$ measuring edge-expansion, $h_\infty^{\pm}(A)$
measuring vertex-expansion and $h_2^{\pm}(A)$ a hybrid. We use the $h_p^{\pm}$ notation here, despite the similarity
to the $h_{gl/mod}$ notation earlier, to be consistent with [5, 7].

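Definition 4.1. Let $\mathcal{M}$ be a Markov chain. Then for $p \ge 1$, $A \subset K$ the discrete p-gradient $h_p^+(A)$ is

$$h_p^+(A) = \frac{Q_p(A,A^c)}{\pi(A)} \quad \text{where} \quad Q_p(A,A^c) = \sum_{v \in A} \sqrt[p]{P(v,A^c)}\ \pi(v) .$$

The (often larger) $h_p^-(A)$ is defined similarly, but with $Q_p(A^c,A)$ rather than $Q_p(A,A^c)$.

These can be extended to $p = \infty$ in the natural way, by taking $Q_\infty(C, D) = \pi(\{u \in C : Q(u, D) \ne
0\})$. We sometimes refer to $h_p^{\pm}(x) = \inf_{\pi(A) \le x} h_p^{\pm}(A)$.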
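
The following sketch evaluates the discrete gradients on a small chain. It assumes the form $h_p^+(A) = Q_p(A,A^c)/\pi(A)$ with $Q_p(A,A^c) = \sum_{v \in A} P(v,A^c)^{1/p}\,\pi(v)$, which is the form consistent with the expression for $Q_2$ used in the proof of Theorem 4.2 and with the values in Example 4.4. The function names `Q_p` and `h_plus` are illustrative, not from the paper.

```python
import numpy as np

def Q_p(P, pi, C, D, p):
    """Q_p(C, D) = sum over u in C of P(u, D)^(1/p) * pi(u).
    For p = np.inf this is the pi-mass of states in C that can step into D."""
    C, D = list(C), list(D)
    step = P[np.ix_(C, D)].sum(axis=1)          # P(u, D) for u in C
    if np.isinf(p):
        weight = (step > 0).astype(float)
    else:
        weight = step ** (1.0 / p)
    return float((weight * pi[C]).sum())

def h_plus(P, pi, A, p):
    """Discrete p-gradient h_p^+(A) = Q_p(A, A^c) / pi(A) (assumed normalization)."""
    A = list(A)
    Ac = [v for v in range(len(pi)) if v not in A]
    return Q_p(P, pi, A, Ac, p) / pi[A].sum()

# Lazy walk on the complete graph K_8 with A = {0, 1, 2}, so x = pi(A) = 3/8.
n = 8
P = 0.5 * np.eye(n) + np.full((n, n), 1.0 / (2 * n))
pi = np.full(n, 1.0 / n)
A = [0, 1, 2]
h1, h2, hinf = (h_plus(P, pi, A, p) for p in (1, 2, np.inf))
```

Here `h1` is the conductance $\Phi(A)$, `hinf` is the fraction of $A$ on the boundary, and `h2` sits between them in the sense that `h2**2 <= h1 * hinf` by Cauchy-Schwarz.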

The main focus of this section will be $h_2^+(A)$ and $h_2^-(A)$ which are hybrids of edge and vertex-
expansion as Cauchy-Schwarz shows, $h_2^{\pm}(A)^2 \le h_\infty^{\pm}(A)\, h_1^{\pm}(A)$. Note that $h_2^-(A)$ can be significantly
larger than $h_2^+(A)$, which is why our theorems below differ in the plus and minus cases. In contrast,
conductance bounds only have one form, for $\Phi(A) = h_1^+(A) = h_1^-(A)$.
In this section it will be shown how discrete gradients can be used to upper and lower bound the
chain of inequalities given in Theorem 3.1. It is not necessary to prove a theorem for the time-reversal
because, for instance, bounds on $\psi^+$ apply to $\overleftarrow{\psi}^+$ as well by bounding $\psi^+$ of the time-reversed Markov
chain. If we let $\psi^-(A)$ be defined as

$$\psi^-(A) = \int_{\pi(A)}^1 \frac{\Psi(t,A^c)}{\pi(A)^2}\,dt$$

then $\psi_{gl} = \psi^+(A) + \psi^-(A)$, so upper and lower bounds on $\psi^{\pm}(A)$ imply upper and lower bounds on
$\psi_{gl}(A)$ as well. Our main result of this section is the following theorem.

Theorem 4.2. Given a (non-reversible) Markov chain M. with state space K and a set A C K, let 

P(u v) 

P* = 1 — inf ugj 4 P(u, A) and P m in = hrf — . If it (A) < \ then 

•j,eA,v£A c , tt(v) 



P(u,v)>0 



and > ^hf(A) 2 

In practice it may be useful to upper bound the log terms either by $\log(12\, P^*/P_{min})$ for $\psi^{\pm}$, or
with $\log(12/h_2^+(A)^2)$ for the $\psi^+(A)$ case and $\log(12/\pi(A)\, h_2^-(A)^2)$ for $\psi^-(A)$. These follow by the
identities $h_1^{\pm}(A) \le h_2^{\pm}(A) \sqrt{P^*}$ and $h_2^{\pm}(A) \ge h_\infty^{\pm}(A) \sqrt{P_{min}}$ for the first type, or by $h_1^+(A), h_\infty^+(A) \le 1$
and $h_\infty^-(A) \le \pi(A)^{-1}$ for the latter.

All upper and lower bounds scale properly in $P$, e.g., if $P$ is slowed by a factor of 2 to $P \to \frac12 (I+P)$
then $\psi^{\pm}(A)$ and all the bounds in Theorem 4.2 change by the same factor of 2. Moreover, if $P(\cdot,A^c)$
is constant over a set $A$ then the upper bounds are sharp, while the lower bounds are within a small
constant factor.

Our methods also extend to the other discrete gradients $h_p^{\pm}$. The most interesting cases are $p = 1$
and $p = \infty$.

Theorem 4.3. Given the conditions of Theorem 4.2 then



\ hf(A) ht(A) > ^(A) > max |i P mm ht(A) 2 , 



The $h_2$ type bounds are the most appealing of the p-gradient bounds because the upper and lower
bounds are the closest. For instance, if $C = h_1^+(A)\, h_\infty^+(A)/h_2^+(A)^2$ then the gap between the upper
and lower bounds for $h_1^+$ or $h_\infty^+$ is at least $C$ (since $\frac12 h_1^+ h_\infty^+ \ge \frac12 h_2^{+\,2} \ge \psi^+(A)$ already gives $C$ in
the first inequality), whereas the gap between the upper and lower bounds in terms of $h_2^{+\,2}$ is at most
$\log(12\, C)$, typically a much smaller quantity. Moreover, the upper bound in terms of $h_2^+(A)^2$ is tighter
than the upper bound for any $p \ne 2$, as can be proven via Cauchy-Schwarz.

The lower bounds on $\psi^{\pm}(A)$ for $p \ne 2$ can be considered as approximations of $\frac12 h_2^{\pm\,2}$, or a bit more
loosely, as approximations of $\frac12 h_1^{\pm} h_\infty^{\pm}$. The Jerrum-Sinclair type bound $\psi^+(1/2) \ge \frac12 {P^*}^{-1} h_1^+(1/2)^2$
is the natural approximation to $\frac12 h_2^{+\,2}$ in terms of $h_1^+$, while the Alon type bound $\psi^+(1/2) \ge
\frac12 P_{min}\, h_\infty^+(1/2)^2$ is natural for $h_\infty^+$. It is too much to expect the upper and lower bounds to match, so
the extra log term in the case of $p = 2$ is not much of a penalty.

Let us now look at the sharpness of Theorem 4.2. 
Example 4.4. Consider the natural lazy Markov chain on the complete graph $K_n$ given by choosing
among vertices with probability $\frac{1}{2n}$ each and holding with probability $\frac12$ (so $P(v,v) = \frac12 (1 + \frac1n)$ and
for $y \ne v$ : $P(v, y) = \frac{1}{2n}$). If $A \subset K$ with size $x = \pi(A) \le 1/2$ it follows that $\forall v \in A : P(v, A^c) = (1 - x)/2$
and therefore $h_2^+(x) = \sqrt{(1 - x)/2}$ and

$$\frac{1 - x}{4} \ge \psi^+(x) \ge \frac{1 - x}{4 \log(24/(1 - x))} \ge \frac{1 - x}{16} ,$$

where we have used that $h_1^+(A)\, h_\infty^+(A) \le 1$. The correct answer is $\psi^+(x) = \int_0^x \frac{t\,(1 - x)/2}{x^2}\,dt = (1 - x)/4$.

Likewise, $\forall v \in A^c : P(v,A) = \frac{x}{2}$ and so $h_2^-(x) = (1 - x)/\sqrt{2x}$, which combined with $\psi_{gl}(A) =
\psi^+(A) + \psi^-(A)$ and the bounds in Theorem 4.2 implies that

$$\frac{1 - x}{4x} \ge \psi_{gl}(A) \ge \frac{1 - x}{4 \log(24/(1 - x))} + \frac{(1 - x)^2}{4x \log(24/(1 - x)^2)} \ge \frac{1 - x}{17x} ,$$

while the correct answer is $\psi_{gl}(A) = \frac{1 - x}{4x}$.

Theorem 3.1 with $\psi^+(1/2) \ge 1/32$ (as found above with $h_2^+$) implies mixing in $\chi^2(1/4) \le 64(\log n +
\log 4)$, while Theorem 3.3 leads to a spectral bound of $\lambda \ge 1/128$, which are correct orders for $\chi^2$ and
$\lambda$. With the $\psi_{gl}$ lower bound found above we also have the correct $\tau(1/4) = O(1)$.

This example shows that for Markov chains with very high expansion the bounds on $\psi_{gl}(A)$ given
by Theorem 4.2 can lead to very good mixing time bounds. However, few Markov chains have such
high expansion, and so in future examples we deal only with $h_2^+(x)$ and $\psi^+(A)$.
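
The complete-graph bounds on $\psi^+$ can be tabulated numerically. The sketch below assumes that the lower bound of Theorem 4.2 has the form $h_2^+(A)^2 / \big(2 \log(12\, h_1^+ h_\infty^+ / h_2^{+\,2})\big)$, which is what the proof with $\alpha = 1 - 1/\sqrt{2}$ suggests, combined with $h_1^+ h_\infty^+ \le 1$ and $h_2^+(x)^2 = (1 - x)/2$. The function name is hypothetical.

```python
import math

def complete_graph_spread_bounds(x):
    """Lower and upper bounds on psi^+(x) for the lazy walk on K_n, 0 < x <= 1/2."""
    h2_sq = (1 - x) / 2                              # h_2^+(x)^2 on the complete graph
    upper = h2_sq / 2                                # equals the exact value (1 - x) / 4
    lower = h2_sq / (2 * math.log(12 / h2_sq))       # uses h_1^+ * h_inf^+ <= 1
    return lower, upper

# Tabulate on a grid of set sizes x in (0, 1/2].
grid = [k / 100 for k in range(1, 51)]
table = [(x,) + complete_graph_spread_bounds(x) for x in grid]
```

Across this grid the ratio of upper to lower bound is $\log(24/(1-x))$, which stays below 4, in line with the final $(1 - x)/16$ estimate of the example.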

The sharpness of the lower bound depends on the sharpness of the $\log(h_1^+(A)\, h_\infty^+(A) / h_2^+(A)^2)$ term
in the denominator. We give here an example in which the lower bound is tight, up to a factor of 1.6,
for every ratio $h_2^+(A)^2 / h_1^+(A)\, h_\infty^+(A)$ and every set size $\pi(A)$.

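Example 4.5. Let the state space $\tilde K = [0,1]$ and fix some $\epsilon \le 1/2$. For ease of computation
we consider this continuous space, but the results of this example apply to finite spaces as well by
dividing $\tilde K = [0, 1]$ into states (intervals) of size $1/n$ and taking $n \to \infty$.

If $t \in [0, 1/2]$ then consider the reversible Markov chain with uniform stationary distribution on
$[0, 1]$ given by the transition densities

$$\frac{dP(t,y)}{dy} = \frac{dP(y,t)}{dt} = \begin{cases} 1 & \text{if } t \le \epsilon \\ (\epsilon/t)^2 & \text{if } \epsilon < t \le 1/2 \end{cases} \quad \text{when } t \in \left[0, \tfrac12\right) \text{ and } y \in \left(\tfrac12, 1\right],$$

holding with the remaining probability. Then, when $A = [0, x] \subset [0, 1/2]$ it follows that

$$\Psi(t,A^c) = \int_{x-t}^x P(y,A^c)\,dy = \begin{cases} \frac{\epsilon^2}{2}\left(\frac{1}{x - t} - \frac{1}{x}\right) & \text{if } t \le x - \epsilon \\ \frac{\epsilon}{2} - \frac{\epsilon^2}{2x} + \frac{t - (x - \epsilon)}{2} & \text{if } t \in (x - \epsilon, x] \end{cases}$$

Some computation shows that $\psi^+(A) = \frac{\epsilon^2}{2x^2}\left(\frac12 + \log(x/\epsilon)\right)$, $h_1^+(A) = \frac{\epsilon}{2x}(2 - \epsilon/x)$, $h_2^+(A) = \frac{\epsilon}{\sqrt{2}\,x}(1 +
\log(x/\epsilon))$ and $h_\infty^+(A) = 1$. This leads to the relation

$$\psi^+(A) = h_2^+(A)^2\, \frac{\frac12 + \log(x/\epsilon)}{(1 + \log(x/\epsilon))^2} \le \frac{h_2^+(A)^2}{1 + \log(x/\epsilon)} . \qquad (3)$$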
Let $\beta \in (0, 1]$ be such that

h+{A) 2 e (l + log(z/e)) 2 



x' ' h+{A) x 2-e/x 

Solving for e shows that e G (0, x] is the unique solution to (3 = jz^^j^j ■ For every (3 this gives a 

Markov chain and an e such that e/x = (3 2 h\ (A) 2 / hf (A) . Letting j3 = e~ k / 2 for an arbitrary k, and 
substituting this e into (3) gives an example with 

. h+(A) 2 h+{A) 2 

* [ j< l + log(x/ e) fe + log( ^ + )^) ) ' 

This shows that no constant $k$ in the denominator will suffice to replace the 1/2 in Theorem 4.2.
Moreover, for every fixed $x$, as $\beta$ (or equivalently, as $\epsilon$) varies the ratio $\frac{h_2^+(A)^2}{h_1^+(A)\, h_\infty^+(A)} = \frac{\epsilon}{x}\, \beta^{-2} =
\frac{\epsilon}{x}\, \frac{(1 + \log(x/\epsilon))^2}{2 - \epsilon/x}$ varies over the complete range $(0,1]$. Therefore, for all set sizes $x$ and all ratios
$h_2^{+\,2}/h_1^+ h_\infty^+$ the lower bound in the theorem is within a factor $2\log(12) \approx 5$ of optimal. In fact,
if the $\alpha$ quantity in the proof (see below) is optimized then the $\alpha$ form is within a factor 1.6 of
optimal.

Proof of Theorem 4.2. We give the proof for the $\psi^+(A)$ case. The $\psi^-(A)$ case is similar.
First let us simplify the terms in the theorem.

Without loss, assume the state space is $\tilde K = [0,1]$ and $A = [0,x]$ where $x := \pi(A)$. Order the
points in $A$ in decreasing exit probability, so that $y, z \in A : y < z \Rightarrow P(y,A^c) \ge P(z, A^c)$. Then

$$\forall t \in [0,x] : \Psi(t,A^c) = \int_{x-t}^x P(y,A^c)\,dy .$$

It follows that

$$\psi^+(A)\,\pi(A)^2 = \int_0^x \Psi(t,A^c)\,dt = \int_0^x \int_{x-t}^x P(y,A^c)\,dy\,dt = \int_0^x y\,P(y,A^c)\,dy ,$$

where the last equality comes from changing the order of integration.
Observe also that

$$Q_2(A,A^c) = \sum_{v \in A} \sqrt{P(v, A^c)}\ \pi(v) = \int_0^x \sqrt{P(t,A^c)}\,dt .$$

We begin with the upper bound on $\psi^+(A)$.

$$\psi^+(A)\,\pi(A)^2 = \int_0^x \sqrt{P(t,A^c)}\ t\,\sqrt{P(t,A^c)}\,dt$$

$$\le \int_0^x \sqrt{P(t,A^c)} \int_0^t \sqrt{P(y,A^c)}\,dy\,dt$$

$$= \frac12 \left( \int_0^x \sqrt{P(t,A^c)}\,dt \right)^2 ,$$

where the inequality is due to $P(t, A^c)$ being non-increasing, and the final equality follows from

$$\int_0^x \int_0^t \sqrt{P(t,A^c)}\,\sqrt{P(y,A^c)}\,dy\,dt = \int_0^x \int_y^x \sqrt{P(t,A^c)}\,\sqrt{P(y,A^c)}\,dt\,dy$$
by changing the order of integration.

Now the first lower bound on $\psi^+(A)$. For any $\epsilon \in [0,x]$

$$Q_2(A,A^c) = \int_0^{\epsilon} 1 \cdot \sqrt{P(t,A^c)}\,dt + \int_{\epsilon}^{Q_\infty(A,A^c)} \sqrt{t\,P(t,A^c)}\,/\sqrt{t}\ dt$$

$$\le \sqrt{\epsilon\,Q_1(A,A^c)} + \sqrt{ \int_{\epsilon}^{Q_\infty(A,A^c)} t\,P(t,A^c)\,dt \int_{\epsilon}^{Q_\infty(A,A^c)} \frac{dt}{t} }$$

$$\le \sqrt{\epsilon\,Q_1(A,A^c)} + \sqrt{ \psi^+(A)\,\pi(A)^2 \log(Q_\infty(A,A^c)/\epsilon) }$$

where the first inequality is by Cauchy-Schwarz. It follows that

$$\psi^+(A) \ge \frac{ \left( Q_2(A,A^c) - \sqrt{\epsilon\,Q_1(A,A^c)} \right)^2 }{ \pi(A)^2 \log(Q_\infty(A,A^c)/\epsilon) } .$$

Letting $\epsilon = \alpha^2\, Q_2(A, A^c)^2/Q_1(A, A^c)$ completes the lower bound. The bound stated in the theorem
follows by setting $\alpha = 1 - 1/\sqrt{2}$. This is a refinement of an argument of Morris and Peres [11].
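
Substituting $\epsilon = \alpha^2 Q_2(A,A^c)^2/Q_1(A,A^c)$ into the displayed lower bound turns the numerator into $(1-\alpha)^2 Q_2(A,A^c)^2$ and the argument of the logarithm into $Q_1 Q_\infty/(\alpha^2 Q_2^2)$. The short check below, a sketch with hypothetical variable names, evaluates the two resulting constants for $\alpha = 1 - 1/\sqrt{2}$, which is where the factor 1/2 and the constant 12 discussed around Theorem 4.2 appear to come from.

```python
import math

alpha = 1 - 1 / math.sqrt(2)

# After substitution the bound reads
#   psi^+(A) >= (1 - alpha)^2 * h_2^2 / log( (1 / alpha^2) * h_1 * h_inf / h_2^2 ).
prefactor = (1 - alpha) ** 2      # constant multiplying h_2^+(A)^2
log_constant = 1 / alpha ** 2     # constant inside the logarithm, rounded up to 12 in the text
```

The prefactor is exactly one half, and `log_constant` is slightly below 12, so replacing it by 12 only weakens the bound.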

The second lower bound will be worked out for the case of general p-gradient $h_p^+(A)$, so that
Theorem 4.3 follows easily as well. It follows from the definitions that if $B \subset A$ then $Q(A \setminus B,A^c) \ge
P_{min}^{1-1/p}\, Q_p(A \setminus B, A^c) \ge P_{min}^{1-1/p} \left( Q_p(A, A^c) - \pi(B) \sqrt[p]{P^*} \right)$. Then

$$\psi^+(A)\,\pi(A)^2 = \int_0^x Q([x-t,x],A^c)\,dt$$

$$\ge P_{min}^{1-1/p} \int_0^x \max\left\{ 0,\ Q_p(A,A^c) - t \sqrt[p]{P^*} \right\} dt$$

$$= \frac{P_{min}^{1-1/p}}{2 \sqrt[p]{P^*}}\ Q_p(A,A^c)^2 .$$

□ 

Proof of Theorem 4.3. The upper bound for $\psi^+(A)$ follows by applying Cauchy-Schwarz to show
$h_2^+(A)^2 \le h_1^+(A)\, h_\infty^+(A)$, and then substituting this into Theorem 4.2. The lower bound follows from
choosing the appropriate p-gradients in the final section of the proof of Theorem 4.2. The bounds for
$\psi^-(A)$ are proven similarly. □



5 A Grid Walk 

The previous two sections have shown that the discrete gradients provide a nice extension of past
isoperimetry results, and that the bounds provide relatively tight upper and lower bounds on
$\psi_{gl}(A)$ and $\psi^+(A)$. In this section we provide an application, a near-optimal result on a random walk
on the binary cube $2^n$, and more generally on the grid $[k]^n$.

The quantity $h_2^+(x)$ has been studied in the theory of concentration of measure. In particular,
Talagrand [14] has shown that

From this and Theorem 4.2 we obtain the following result. 
Corollary 5.1. The mixing time of the lazy random walk on the cube $\{0,1\}^n$ is

$$\tau = O(n \log^2 n) .$$

This method (also used by Montenegro [10] and Morris and Peres [11]) gives the first isoperimetric proof that $\tau$ is quasi-linear for this particular random walk. In fact, the bound can be improved to $\tau = O(n \log n)$ because the proofs of both bounds in Theorem 2.1 only require that some expanding sequence of sets be considered. When $p^{(0)}$ is a point (the worst case) then these sets are "fractional Hamming balls" in the blocking conductance case, and Hamming balls in the evolving set case. In both cases a modified quantity of the form

$$\frac{-\log \pi(A)\pi(A^c)}{n}$$

can be found [10, 11]. Unfortunately neither method extends to even something as similar as the grid $[k]^n$, as the level sets are no longer Hamming balls.
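The quasi-linear behaviour discussed above can be checked numerically for small $n$. The following is a minimal sketch, not part of the isoperimetric argument: it assumes mixing is measured in total variation distance with threshold 1/4 (the paper's own definition of $\tau$ may differ), and the function name `hypercube_tv_mixing_time` is hypothetical. Started from a single vertex, the walk's distribution is uniform on each Hamming level, so it suffices to track the Hamming weight.

```python
import math

def hypercube_tv_mixing_time(n, eps=0.25):
    """First time t at which the lazy walk on {0,1}^n, started at the
    all-zeros vertex, is within total variation distance eps of uniform.

    Assumption: "lazy" means stay with probability 1/2, otherwise flip a
    uniformly chosen coordinate. Only the Hamming weight is tracked, which
    is exact for this starting point by symmetry.
    """
    # Stationary law of the Hamming weight is Binomial(n, 1/2).
    pi = [math.comb(n, w) / 2 ** n for w in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[0] = 1.0
    t = 0
    while 0.5 * sum(abs(p - q) for p, q in zip(dist, pi)) > eps:
        new = [0.0] * (n + 1)
        for w, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[w] += 0.5 * mass                         # lazy step
            if w < n:
                new[w + 1] += mass * (n - w) / (2 * n)   # flip a 0 to a 1
            if w > 0:
                new[w - 1] += mass * w / (2 * n)         # flip a 1 to a 0
        dist = new
        t += 1
    return t

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, hypercube_tv_mixing_time(n), round(n * math.log(n), 1))
```

Printing the computed time next to $n \log n$ gives a rough feel for the growth rate; the classical coupling argument bounds the total variation mixing time by $n \log(4n)$, which the computed values should respect.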

We now give a proof extending Talagrand's result to the case of the grid $[k]^n$. We will require the isoperimetric quantity

$$\beta^{+2} = \inf_{A \subset K} \frac{Q(A,A^c)^2}{\pi(A)\pi(A^c)} = \inf_{A \subset K} \Phi(A)^2\, \frac{\pi(A)}{\pi(A^c)}$$

studied by Murali [12].

Theorem 5.2. Suppose that $K^n = K_1 \times K_2 \times \cdots \times K_n$ is a Cartesian product of Markov chains. Then

$$h_2^+(x) \ge \frac{\min_i \beta^+(K_i)}{2\sqrt{n}}\, \sqrt{-\log x(1-x)} .$$
Houdré and Tetali [5] studied $h_2^+$ on products and found that $h_2^+(K^n) \ge \frac{1}{2\sqrt{6n}} \min_i h_2^+(K_i)$. The advantage of our theorem is that it considers set sizes as well, which is important for studying mixing time.

Proof. The proof will show a chain of inequalities relating isoperimetric quantities introduced by various authors. Our main interest is to lower bound $h_2^+(x)$, so rather than go into details of these quantities we will simply give definitions and state the inequalities we need. The interested reader can learn more about these quantities in [4, 3, 12].

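Bobkov's constant $b_p^+$ is defined to be the largest constant such that for all $f : K \to [0, 1]$

$$I_\gamma(\mathbf{E} f) \le \mathbf{E} \sqrt{I_\gamma(f)^2 + (D_p^+ f)^2 / (b_p^+)^2} ,$$

where $I_\gamma(x)$ is the so-called Gaussian isoperimetric function,

$$I_\gamma(x) = \varphi \circ \Phi^{-1}(x) \quad \text{where} \quad \varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \quad \text{and} \quad \Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-y^2/2}\,dy$$

(here $\Phi$ is the Gaussian distribution function, not the conductance), which makes its appearance in many isoperimetric results, and where

$$D_p^+ f(x) = \left( \int \left((f(x) - f(y))^+\right)^p P(x,y)\,dy \right)^{1/p} .$$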
Note that $I_\gamma(0) = I_\gamma(1) = 0$, $I_\gamma(x) \ge x(1-x)\sqrt{\log(1/x(1-x))}$, and if $A \subset K$ then $\mathbf{E}(D_p^+ 1_A) = Q_p(A, A^c)$. The test functions $f = 1_A$ thus show that

$$b_p^+ \le \frac{Q_p(A,A^c)}{I_\gamma(\pi(A))} \le \frac{2\, h_p^+(A)}{\sqrt{-\log \pi(A)\pi(A^c)}} ,$$

so in a sense Bobkov's constant is a functional form of $\inf_{A \subset K} h_p^+(A)/\sqrt{-\log \pi(A)\pi(A^c)}$, just as the spectral gap $\lambda$ can be considered a functional form of the conductance $\Phi$.
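The Gaussian isoperimetric function and the lower bound $x(1-x)\sqrt{\log(1/x(1-x))}$ quoted above are easy to evaluate numerically. The sketch below uses only the Python standard library; the helper names `gaussian_isoperimetric` and `lower_bound` are illustrative and not taken from the paper, and the endpoint values are set to 0 by the convention $I_\gamma(0) = I_\gamma(1) = 0$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def gaussian_isoperimetric(x):
    """I_gamma(x) = phi(Phi^{-1}(x)), with I_gamma(0) = I_gamma(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(x))

def lower_bound(x):
    """x(1-x) * sqrt(log(1 / (x(1-x)))), the lower bound quoted in the text."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x * (1.0 - x) * math.sqrt(math.log(1.0 / (x * (1.0 - x))))

if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 0.25, 0.5):
        print(x, gaussian_isoperimetric(x), lower_bound(x))
```

Comparing the two columns on a grid of $x$ values shows how much is lost when $I_\gamma$ is replaced by the elementary lower bound, which is the step that produces the factor 2 in the display above.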
It is known that Bobkov's constant $b_2^+$ tensorizes as

$$b_2^+(K^n) = \frac{1}{\sqrt{n}} \min_i b_2^+(K_i) .$$

Lower bounding $b_2^+(K_i)$ is difficult. One method is to use $b_2^+(K_i) \ge b_1^+(K_i)$, which follows easily from the definitions. In her Ph.D. dissertation Murali [12] showed that $b_1^+(K_i) \ge \beta^+(K_i)$. We then have the following chain of inequalities:

$$\frac{2\, h_2^+(x)}{\sqrt{-\log x(1-x)}} \ge b_2^+(K^n) = \frac{1}{\sqrt{n}} \min_i b_2^+(K_i) \ge \frac{1}{\sqrt{n}} \min_i b_1^+(K_i) \ge \frac{1}{\sqrt{n}} \min_i \beta^+(K_i) .$$

□

Theorem 5.2 is unlikely to prove of much use for Markov chains, other than ones with relatively small state spaces, because $\beta^{+2} = \inf_{A \subset K} \Phi(A)^2\, \frac{\pi(A)}{\pi(A^c)}$ will be extremely small unless $\pi_0$ is not too small. However, if a better method is found to lower bound Bobkov's constant $b_1^+(K)$ or $b_2^+(K)$ then the theorem, with $\beta^+(K_i)$ replaced by $b_1^+(K_i)$ or $b_2^+(K_i)$, could prove useful for general tensorization results. Nevertheless, the $\beta^+$ method suffices for our purposes here.

Corollary 5.3. The mixing time of the lazy random walk on the grid $[k]^n$ is

$$O(k^2 n \log^2 n) .$$

Proof. The lazy random walk on the line $[k]$ satisfies $\Phi(x) \ge 1/(4kx)$. It follows that $\beta^{+2} \ge 1/4k^2$. By Theorem 5.2, $h_2^+(x) \ge \frac{1}{4k\sqrt{n}} \sqrt{-\log x(1-x)}$. Combining this with Theorems 2.1 and 4.2, and the simplification of $h_\infty^+(A)/h_2^+(A)$ discussed after Theorem 4.2, gives $\tau = O(k^2 n \log n \log\log k^n)$.

The $\log\log k^n$ term can be improved to $\log n$ because of "ultracontractivity" of geometric Markov chains, such as this one on $[k]^n$, which implies that $\pi_0$ may be taken as $2^{-n}$ instead of $k^{-n}$ with only the addition of a small constant factor. [?] □

This is the first isoperimetric proof of $\tau = O^*(k^2 n)$ for this Markov chain, and is quite close to the correct $\tau = \Theta(k^2 n \log n)$.
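The conductance bound $\Phi(x) \ge 1/(4kx)$ for the lazy walk on the line $[k]$, used at the start of the proof of Corollary 5.3, can be verified by brute force for small $k$. The sketch below rests on assumptions not spelled out in this section: the walk moves left or right with probability 1/4 each and stays put otherwise (including when a move would leave the line), the stationary distribution is therefore uniform, and $\Phi(A) = Q(A,A^c)/\pi(A)$. All function names are hypothetical.

```python
from itertools import combinations

def lazy_line_walk(k):
    """Transition matrix of the lazy walk on {0, ..., k-1}: step left or
    right with probability 1/4 each, otherwise stay (blocked moves stay)."""
    P = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in (i - 1, i + 1):
            if 0 <= j < k:
                P[i][j] = 0.25
        P[i][i] = 1.0 - sum(P[i])  # remaining mass is the holding probability
    return P

def conductance(P, A):
    """Phi(A) = Q(A, A^c) / pi(A) under the uniform stationary distribution."""
    k = len(P)
    inside = set(A)
    flow = sum(P[i][j] / k for i in inside for j in range(k) if j not in inside)
    return flow / (len(inside) / k)

def min_ratio(k):
    """Minimum of Phi(A) * 4 * k * pi(A) over nonempty A with pi(A) <= 1/2.
    The bound Phi(A) >= 1/(4 k pi(A)) holds exactly when this is >= 1."""
    P = lazy_line_walk(k)
    best = float("inf")
    for size in range(1, k // 2 + 1):
        for A in combinations(range(k), size):
            best = min(best, conductance(P, A) * 4 * k * (size / k))
    return best

if __name__ == "__main__":
    for k in (4, 6, 8):
        print(k, min_ratio(k))
```

Under these assumptions the minimum is attained by an interval at one end of the line, which has a single boundary edge, so the bound is tight for such sets.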